This is the University of Edinburgh PPLS Writing Centre workshop The aRt of the figure: Visualising your data using R. Feel free to browse the script beforehand if you’d like, but also don’t be intimidated by the amount of code: you will be guided through all the exercises in the workshop (except the bonus sections, which are meant as additional material to be explored on your own once you’ve learned the basics).

What you see below are the results of a “completed” script file. Once you do all the little exercises and complete all the bonus sections - and knit the R Markdown script into an HTML web page - this is more or less what you would see.

If you don’t have the script file (artofthefigure.Rmd) yet, go and download it




Recap: files, scripts and plotting functions

Note on plotting and generating artificial example data for plotting

  • The plot() command is a generic function that (often) automatically adopts to data type.
  • We will be using plot() and a few of its cousins, but their parameters (like col for color) usually all have the same names.
  • Arguments have names, but names can be omitted if using them in their intended order. You can always check the parameters of a function by executing help(functionname) or searching for the function by name in the Help tab on the right.
plot( c(1,4,10) ) # the basic plot command

# note: the c() command *c*ombines values into a vector; this will be useful later

We will be using some datasets built into R, as well as randomly generated data. It is not difficult to import your own data into R, but we’ll save time by not doing that. Importing data is covered in the last “bonus” section below if you’re interested.

runif(n = 1) # this command generates n random numbers
## [1] 0.1870263
sample(x = c(1,4,10), size = 1) # this produces a number of samples from the provided vector (x)
## [1] 10

Numerical data

Numerical values include things we can measure on a continuous scale (height, weight, reaction time), things that can be ordered (“rate this on a scale of 1-5”), and things that have been counted (number of participants in an experiment, number of words in a text).

A single numeric variable

We will use a built-in dataset called “iris” - it contains information about a bunch of flowers.

data("iris")  # load the data into the workspace (or "global environment").
# In RStudio, you can have a look at the dataframe by clicking on the little "table" icon next to it in the Environment section (top right).

# We can also inspect the data using R commands.
head(iris)     # prints the first rows
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
summary(iris)  # produces an automatic summary of the columns
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300  
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
##        Species  
##  setosa    :50  
##  versicolor:50  
##  virginica :50  
##                 
##                 
## 
# Plotting time! Let's see for example how long the petals are in the dataset
iris$Petal.Length       # the $ is used for accessing (named) column of a dataframe
##   [1] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 1.5 1.6 1.4 1.1 1.2 1.5 1.3
##  [18] 1.4 1.7 1.5 1.7 1.5 1.0 1.7 1.9 1.6 1.6 1.5 1.4 1.6 1.6 1.5 1.5 1.4
##  [35] 1.5 1.2 1.3 1.4 1.3 1.5 1.3 1.3 1.3 1.6 1.9 1.4 1.6 1.4 1.5 1.4 4.7
##  [52] 4.5 4.9 4.0 4.6 4.5 4.7 3.3 4.6 3.9 3.5 4.2 4.0 4.7 3.6 4.4 4.5 4.1
##  [69] 4.5 3.9 4.8 4.0 4.9 4.7 4.3 4.4 4.8 5.0 4.5 3.5 3.8 3.7 3.9 5.1 4.5
##  [86] 4.5 4.7 4.4 4.1 4.0 4.4 4.6 4.0 3.3 4.2 4.2 4.2 4.3 3.0 4.1 6.0 5.1
## [103] 5.9 5.6 5.8 6.6 4.5 6.3 5.8 6.1 5.1 5.3 5.5 5.0 5.1 5.3 5.5 6.7 6.9
## [120] 5.0 5.7 4.9 6.7 4.9 5.7 6.0 4.8 4.9 5.6 5.8 6.1 6.4 5.6 5.1 5.6 6.1
## [137] 5.6 5.5 4.8 5.4 5.6 5.1 5.1 5.9 5.7 5.2 5.0 5.2 5.4 5.1
iris[, "Petal.Length"]  # this is the other indexing notation
##   [1] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 1.5 1.6 1.4 1.1 1.2 1.5 1.3
##  [18] 1.4 1.7 1.5 1.7 1.5 1.0 1.7 1.9 1.6 1.6 1.5 1.4 1.6 1.6 1.5 1.5 1.4
##  [35] 1.5 1.2 1.3 1.4 1.3 1.5 1.3 1.3 1.3 1.6 1.9 1.4 1.6 1.4 1.5 1.4 4.7
##  [52] 4.5 4.9 4.0 4.6 4.5 4.7 3.3 4.6 3.9 3.5 4.2 4.0 4.7 3.6 4.4 4.5 4.1
##  [69] 4.5 3.9 4.8 4.0 4.9 4.7 4.3 4.4 4.8 5.0 4.5 3.5 3.8 3.7 3.9 5.1 4.5
##  [86] 4.5 4.7 4.4 4.1 4.0 4.4 4.6 4.0 3.3 4.2 4.2 4.2 4.3 3.0 4.1 6.0 5.1
## [103] 5.9 5.6 5.8 6.6 4.5 6.3 5.8 6.1 5.1 5.3 5.5 5.0 5.1 5.3 5.5 6.7 6.9
## [120] 5.0 5.7 4.9 6.7 4.9 5.7 6.0 4.8 4.9 5.6 5.8 6.1 6.4 5.6 5.1 5.6 6.1
## [137] 5.6 5.5 4.8 5.4 5.6 5.1 5.1 5.9 5.7 5.2 5.0 5.2 5.4 5.1
plot(iris$Petal.Length) # two observations: there is quite a bit of variation, and it seems there are clusters in the data

hist(iris$Petal.Length, breaks=25) # a histogram shows the distribution of values (breaks change resolution)

boxplot(iris$Petal.Length)         # a boxplot is like a visual summary()
points(x=rep(1, nrow(iris)), y=iris$Petal.Length) # could also add actual datapoints

# Here's something to try: the default color of the points is black. Change it to something else by adding the parameter col to the points command (remember, parameters are separated by commas, and they are given values using the = sign; color names must be in quotes, e.g., "red").
boxplot(iris$Petal.Length)
points(x=rep(1, nrow(iris)), y=iris$Petal.Length, col="red")

# Plot boxplots, grouped by the Species variable:
boxplot(iris$Petal.Length ~ iris$Species) # note the ~ notation
grid() # why not add a grid for reference

# A slightly nicer version:
boxplot(iris$Petal.Length ~ iris$Species, ylab="petal length", border=c("purple", "darkblue", "lightblue"), boxwex=0.5, cex=0.4) 
abline(h=1:7, col=rgb(0,0,0,0.1))  # adds vertical lines instead of full grid

A Note on the rgb(red, green, blue, alpha) function: this allows making custom colors; alpha controls transparency. Possible values range between 0 and 1 by default. Below is a piece of code that generates an example of how the color scheme works (don’t worry if you don’t understand the actual code, this is above the level of this workshop; just put the cursor in the code block and press CTRL+SHIFT+ENTER (CMD+SHIFT+ENTER on Mac).

Inspecting two numeric variables

This short workshop is not intended to cover statistical concepts like correlation, but we will nevertheless quickly look at plotting two variables against each other.

plot(iris$Sepal.Length, iris$Sepal.Width) # no interaction?

# Why not color-code by species. This method also makes use of the [ ] indexing.
iriscolors= c("purple", "darkblue", "lightblue")
plot(iris$Sepal.Length, iris$Sepal.Width, 
     col=iriscolors[iris$Species], pch=20) # pch sets the point type
legend("topleft", pch=20, legend = levels(iris$Species), col=iriscolors, cex=0.7, bty="n") # add legend

# Whoa... This is suddenly a lot of code out of nowhere. If some of it looks overwhelming, worry not, everything will become clear once you get into R a bit more. 

# Add detail to make this legible to the colour-blind, and printable in black-and-white
irispoints = c(15,16,17)  # see help(points) for more
plot(iris$Sepal.Length, iris$Sepal.Width, 
     col = iriscolors[iris$Species] , pch = irispoints[iris$Species])
legend("topleft", pch=irispoints, legend = levels(iris$Species), col=iriscolors, cex=0.7, bty="n")

# Make this publication-ready by adding proper labels
plot(iris$Sepal.Length, iris$Sepal.Width, 
     col=iriscolors[iris$Species] , pch=irispoints[iris$Species],
     main="Iris sepals in three species", xlab="Sepal length", ylab="Sepal width")
grid(col=rgb(0, 0, 0, 0.2))  # 0,0,0 results in black, 0.2 alpha makes it transparent (so, grey on a white background)
legend("topleft", pch=irispoints, legend = levels(iris$Species), col=iriscolors, cex=0.7, bty="n")

## Bonus: plotting regression lines
# This workshop does not cover actual statistical techniques, but in case you ever need to plot a regression line* - this is pretty simple in R:

# another iris plot
plot(iris$Sepal.Length, iris$Petal.Length, pch=irispoints[iris$Species])
grid(col=rgb(0,0,0,0.2), lty=1)

# do the regression analysis:
linmodel = lm(iris$Petal.Length ~ iris$Sepal.Length)
abline(linmodel, col=rgb(0,0,0,0.3), lwd=4) # abline can handle the output of the lm (linear model) command

# *Of course we already recognized that the data is more complex as initially thought (consisting of three distinct groups), so a proper regression model should take that into account.

Time series

While a whole subject on its own, we will have a quick look at plotting time series - data reflecting changes in some variable over time.

# This time we'll generate some random data and pretend it's real data.
# "The following data are reaction times to stimuli of one individual, over 100 trials, in an experiment on..."
retime = c(runif(20, 0,0.1), seq(0.1, 2.8,length.out = 80)) *  runif(100, 0.7, 1.1 )

# Have a look at the raw data first
head(retime)
## [1] 0.000195651 0.054098121 0.014333827 0.056923174 0.066771822 0.058447789
# Now let's plot it
plot(retime, ylab="reaction time" )

plot(retime, ylab="reaction time", type="l" )

What can you tell by the looks of the data?

Exercise: improve this plot by adding the type parameter and setting its value as “l” (which stands for line, which are more useful in this instance), and set the X axis label (xlab) to say “trials” instead of the default “Index”.

## Bonus: plotting multiple (although still artificial) time series.
# First we need to create more data; let's bind it into a matrix as well
retimes = rbind(subj1=retime, 
                subj2=sort(rnorm(100, 1,0.3))*runif(100,0.4,0.8), 
                subj3=sort(rnorm(100, 1,0.3))*runif(100,0.9,1.1) ) 
# rows are subjects, columns are trials
# create an empty plot beforehand:
plot(NA, ylim=c(0,3), xlim=c(0,100), ylab="reaction time", xlab="trials"); grid()
# ...then add a line per subject; [1,] indexes the first row, etc:
lines(retimes[1, ], col="darkred")
lines(retimes[2, ], col="darkblue")
lines(retimes[3, ], col="darkgreen")

Categorical data

Categorical/nominal/discrete values cannot be put on a continuous scale or ordered, and include things like binary values (student vs non-student) and all sorts of labels (noun, verb, adjective). Words in a text could be viewed as categorical data.

Categories and contingency tables

Here is another artificial dataset. Let’s pretend I went around Edinburgh and asked random people on the street the following question: “A new species of insect was recently discovered in Scotland, and they called it Boubicus Boubasus - or Bouba for short. What’s your intuition, is Bouba a big fat bug, or a small slim bug?” (and the same for Kikis Kikosius, or Kiki for short)

boubakiki = data.frame(
  meanings=c(
    sample(c("big", "small"),25,T, prob=c(0.8,0.2)), 
    sample(c("big", "small"),24,T, prob=c(0.3,0.7))
    ),
  words=c(rep("bouba",25), rep("kiki",24))
  ) # this command will create the random data

# Have a look at the raw data first.

# In addition to eyeballing the data, use the following commands: nrow(), dim()
nrow(boubakiki) # number of rows
## [1] 49
dim(boubakiki)  # dimensions: number of rows and columns
## [1] 49  2
# Now let's use the table() function to make sense of it:
bktable = table(boubakiki)
mosaicplot(bktable, col=c("orange", "navy"))  # a simple mosaic plot, displays proportions

barplot(bktable, ylab="big    small")         # a barplot, displays counts

## Bonus:
# You *could* also use heatmap(bktable), but a heatmap does not make much sense with just 2x2 categories - it's easier to just look at the actual table. To get an idea how it would look like with larger contingency tables (or things like similarity matrices), check out this plot (of some more random data): 
heatmap(matrix(runif(20^2), 20,20, dimnames=list(letters[1:20], letters[1:20])), symm=T)

## Bonus 2:
# If you ever find yourself dealing with a correlation matrix (a matrix of correlation values between a bunch of variables; usually via the cor() function in R), then you might be interested in the corrplot package. It is similar to heatmaps, but generates plots with correlation matrices in mind. Do an install.package("corrplot") and try the following
library("corrplot")
## corrplot 0.84 loaded
corrplot(cor(iris[,1:4]) )

Words!

install.packages("wordcloud") # install a "package", a collection of functions to extend R's basic functionality; this needs to be done only once for each package.
# Let's create an object with a bunch of text:
sometext = "In a hole in the ground there lived a hobbit. Not a nasty, dirty, wet hole, filled with the ends of worms and an oozy smell, nor yet a dry, bare, sandy hole with nothing in it to sit down on or to eat: it was a hobbit-hole, and that means comfort. It had a perfectly round door like a porthole, painted green, with a shiny yellow brass knob in the exact middle. The door opened on to a tube-shaped hall like a tunnel: a very comfortable tunnel without smoke, with panelled walls, and floors tiled and carpeted, provided with polished chairs, and lots and lots of pegs for hats and coats—the hobbit was fond of visitors. The tunnel wound on and on, going fairly but not quite straight into the side of the hill — The Hill, as all the people for many miles round called it — and many little round doors opened out of it, first on one side and then on another. No going upstairs for the hobbit: bedrooms, bathrooms, cellars, pantries (lots of these), wardrobes (he had whole rooms devoted to clothes), kitchens, dining-rooms, all were on the same floor, and indeed on the same passage. The best rooms were all on the left-hand side (going in), for these were the only ones to have windows, deep-set round windows looking over his garden, and meadows beyond, sloping down to the river."

# Now let's do some very basic preprocessing to be able to work with the words in the text:
clean = gsub("[[:punct:]]", "", sometext) # remove punctuation (that weird thing inside the gsub (R's find-and-replace command) is a regular expression; don't ask, it just works)
cleanlow = tolower(clean) # make everything lowecase
words = strsplit(cleanlow, split=" ")[[1]]
# Inspect the object we just created. It should be a vector of 232 words.

# Some ways to inspect and visualize textual data
sortedwords = sort(table(words)) # counts the words and sorts them
plot(sortedwords, xaxt="n")
axis(1, 1:length(sortedwords), names(sortedwords), las=2, cex.axis=0.5) # add the words

# Time to use the wordcloud package we installed earlier
library("wordcloud") # load the package (needs to be done again when you start R again)
## Loading required package: RColorBrewer
wordfreqs = as.numeric(sortedwords) # get the frequencies from the table object

wordcloud(words = names(sortedwords), freq=wordfreqs, min.freq = 0) # needs the wordcloud package to work

# Note: if R gives you errors (saying word x could not fit), ignore them. Also, if plots look strange after using wordcloud, use the dev.off() command to reset graphics.
## Bonus: further cleaning for nicer clouds
# Ideally we would remove stopwords (the, and...) before plotting things like wordclouds. There are packages that do that (tm, text2vec) in various ways, or you could write some code to do it yourself (using clever math to give greater weight to "context" words, removing stopwords using lists or regular expressions, etc).
# A very simple approach is to just remove all short words:
sortedwords2 = sortedwords[which(nchar(names(sortedwords))>4 ) ] # look up the help files of the commands used here if you'd like to understand how this works exactly
wordcloud(names(sortedwords2), as.numeric(sortedwords2), min.freq = 0, col=terrain.colors(10))
## Warning in wordcloud(names(sortedwords2), as.numeric(sortedwords2),
## min.freq = 0, : round could not be fit on page. It will not be plotted.

# terrains.colors() is a "palette", a function that generates a number of colors from a predefined set

Almost there!

That’s it for today.

Do play around with these things later when you have time, and look into the bonus sections for more cool stuff. If you get stuck, Google is your friend; also, check out www.stackoverflow.com - this site is a goldmine of programming (including R) questions and solutions. If you get into making more plots, you might also want to look into the ggplot2 package, which offers an alternative way of making plots, which some people find more intuitive than the basic R way.

Also, if you get around analysing your own data and need help in terms of writing about the results of your analyses and presenting your data, choosing the right graph type, what to plot, and in general how to plot the things you want to plot - feel free to book a session with me through the PPLS Writing Centre: go to http://writingcentre.ppls.ed.ac.uk/appointments/ -> Book an appointment -> look for my name (Andres Karjus) in the list of tutors, in the ‘Select staff’ dropdown menu.


But wait! There’s one more thing to do. Since this is an R Markdown document, we can “knit” it into a nice HTML (or PDF, or Word) report file - it will show both the code and the plots produced by the code. Note that unfortunately this will not work if you have errors in your code - marked by the little red x signs on the left side vertical bar. To knit, click the Knit button (with the little blue ball of yarn) above the script window. If the code is without errors, an HTML document will appear.




Bonus: things to try out at home.

Here are some more things you can try out at home later.

Small note: if you try knitting the RMarkdown file again later and would like to see output from the bonus sections, set eval=TRUE in these blocks, which will allow them to be rendered (all bonus blocks currently have the eval parameter set as FALSE). Do not set eval to TRUE in the small blocks with the install.packages() commands though. You might have also noticed the echo=F parameter - this just means the code itself will not be rendered in the knit output (even when it is executed).

Plotting social networks

# This block is for installing a package, leave eval=F
install.packages("igraph") # install a "package", a collection of functions to extend R's basic functionality; this needs to be done only once for each package.
install.packages("visNetwork") # install this one as well
# Feel free to set eval=T to render the blocks herein when knitting to html.
library("igraph", quietly = T, verbose = F)  # load the package; this needs to be done once after starting R/RStudio
## 
## Attaching package: 'igraph'
## The following objects are masked from 'package:stats':
## 
##     decompose, spectrum
## The following object is masked from 'package:base':
## 
##     union
# Create an object with some Scottish people (could be a sample in a sociolinguistic study):
scots=c("Angus","Archibald","Baldwin","Boyd","Cinead","Craig","Diarmid","Donald","Duncan","Eachann","Erskine","Ethan","Fergus","Fingal","Fraser","Hamilton","Iain","Irvine","James","Muir","Mungo","Owen","Raibert", "Lyall", "Margaret", "Mairi", "Morag", "Murdina","Rhona", "Sorcha", "Thomasina","Una") 
nscots = length(scots) # record the number of people in an object
mates = matrix(sample(0:1,nscots^2,T,prob=c(0.95,0.05)), ncol=nscots, nrow=nscots, dimnames=list(scots, scots)); diag(mates)=0 # this creates a randomized matrix signifying friendships; no need to think about this too hard for now
mates[1:10,1:10] # but have a look at it anyway; '1' means these two people know each other; this line prints the first 10 rows and 10 columns
##           Angus Archibald Baldwin Boyd Cinead Craig Diarmid Donald Duncan
## Angus         0         0       0    0      0     0       0      0      0
## Archibald     0         0       0    0      0     0       0      0      0
## Baldwin       0         0       0    0      0     0       0      0      0
## Boyd          0         0       0    0      0     0       0      0      0
## Cinead        0         0       0    0      0     0       0      0      0
## Craig         0         0       0    0      0     0       0      0      0
## Diarmid       0         0       0    0      0     0       0      0      0
## Donald        0         1       0    0      0     0       0      0      0
## Duncan        0         0       0    0      0     0       0      0      0
## Eachann       0         0       0    0      0     0       0      0      0
##           Eachann
## Angus           0
## Archibald       0
## Baldwin         0
## Boyd            0
## Cinead          0
## Craig           0
## Diarmid         0
## Donald          0
## Duncan          0
## Eachann         0
scotgraph = graph_from_adjacency_matrix(mates, mode = "undirected", diag=FALSE) # creates a graph object; igraph needs to be loaded

# Have a look at the scotgraph object (list of links/"edges"). Then plot the scotgraph object.

# OK, this produces a network, but it's not very nice looking...
# Let's modify the plotting parameters, and add M/F values for the sample
mf = c(rep("m", nscots-9), rep("f", 9)) # create a vector of labels
mfcolors = ifelse(mf=="m", yes="blue",no="tomato")
par(mar=c(0,0,0,0)) # makes margins more suitable for igraph plotting
plot(scotgraph, 
     vertex.size=4, vertex.color="lightgray", vertex.frame.color=NA,   # vertex color and size
     vertex.label.cex=0.9, vertex.label.dist=0.1, vertex.label.font=2, # vertex labels
     vertex.label.color=mfcolors,   # label color
     edge.color=rgb(0,0,0,0.3))

# Let's try something else. Using the same graph data, we'll recreate it using another package, visNetwork, which makes graphs interactable.
library(visNetwork, quietly=T, verbose = F)
scotgraph_v = toVisNetworkData(scotgraph) # converts the previous igraph object into a visNetwork object
scotgraph_v$nodes$color = mfcolors # let's use the same node colors
scotgraph_v$edges$color = "lightgray"
# This would create the graph using default settings: visNetwork(nodes = scotgraph_v$nodes, edges = scotgraph_v$edges)
#  but let's make it a bit nicer.
# You may notice that in the visNetwork help files, the examples use the magrittr package's %>% pipe notation. The example below does not in order to keep things simple, but feel free to explore magrittr, it makes writing sequences such as the one below much more elegant.
v = visNetwork(nodes = scotgraph_v$nodes, edges = scotgraph_v$edges, height = "500px")
v = visNodes(v, size = 10,  shadow=T)
v = visPhysics(v, enable=T, solver = "forceAtlas2Based", forceAtlas2Based = list(gravitationalConstant = -10)) 
v # calling the visNetwork object plots it. try clicking on the nodes, moving them, and zooming. Pretty neat, no?

X marks the spot: making maps in R

Making maps programmatically based on data would come in handy if your worked with demographic data, or dialects, areal sociolinguistics, etc.

install.packages("raster") # install another package
library("raster")
## Loading required package: sp
uk = getData("GADM", country = "United Kingdom", level = 2) # download UK map (needs the raster package to be loaded)
par(mar=c(0,0,0,0)) # change plot area for better visibility - mar defines the margins, which are usually used for the axes and labels (reset using dev.off() later)
plot(uk, lwd=0.4) # plot the UK
grid()            # add grid
points(-3.1833, 55.9533, pch=20, col="blue", cex=2) # your current location :)

plot(uk, lwd=0.4, xlim=c(-4,-1), ylim=c(55.7,55.8)); points(-3.1833, 55.9533, pch=20, col="blue", cex=2) # zoom in

Interactive plots in 3D!

A quick look at a package that creates interactive (clickable, rotatable, etc) plots, both in 2D and 3D - these also work in web pages (like the html file you could create by knitting this script file; R Markdown can also be used to create slides, meaning you could easily include interactive graphs in your next presentation).

install.packages("plotly") # install the package
library(plotly, quietly = T)
# another look at the iris data (note this uses magrittr's pipe %>% notation)
plot_ly(x= iris$Petal.Length,y=iris$Sepal.Width, z=iris$Sepal.Length, type="scatter3d",mode="markers",color=iris$Species) %>% layout(scene = list(yaxis=list(title="Sepal width"),xaxis=list(title="Petal length"), zaxis=list(title="Sepal.Length")))
# Remember the rgb color plots with the black background from earlier? Given there are 3 colors, they really would make more sense in 3D, no?
red=runif(1000);green=runif(1000);blue=runif(1000)
plot_ly(x=red,y=green, z=blue, type="scatter3d",mode="markers", color=I(apply(cbind(red, green, blue,0.5),1, function(x) rgb(x[1],x[2], x[3])))) %>% layout(paper_bgcolor='black', scene = list(xaxis=list(title="red"),yaxis=list(title="green"), zaxis=list(title="blue")))

Last but not least - getting your own data into R and getting plots out of R.

Once you get around to working with your own data, you’ll need to import it into R to be able to make plots based on it. There are a number of ways of doing that.

Table (csv, Excel, txt) into R, import from file

This is probably the most common use case. If your data is in an Excel file formal (.xls, .xlsx), you are better off saving it as a plain text file (although there are packages to import directly from these formats, as well as from SPSS .sav files). The commands for that are read.table(), read.csv() and read.delim(). They basically all do the same thing, but differ in their default settings.

# an example use case with parameters explained
mydata = read.table(file="path/to/my/file.txt", # full file path as a string
                    header=T,      # if the first row contains column names
                    row.names=1,   # if the 1st (or other) column is row names
                    sep="\t",      # what character separates columns in the text file*
                    quote="",      # if quotes in text in any columns, set this to ""
                    )
# * "\t" is for tab (default if you save a text file from Excel), "," for csv, " " if space-spearated, etc
# for more and to check the defaults, see help(read.table)
# the path can be just a file name, if the file is in the working (R's "default") directory; use getwd() to check where that is, and setwd(full/path/to/folder) to set it (or you can use RStudio's Files tab, click on More)

Importing from clipboard

There is a simple way to import data from the clipboard. While importing from files is generally a better idea (you can always re-run the code and it will find the data itself), sometimes this is handy, like quickly grabbing a little piece of table from Excel. It differs between OSes:

mydata = read.table(file = "clipboard")       # in Windows (add parameters as necessary)
mydata = read.table(file = pipe("pbpaste"))   # on a Mac

Importing text

For text, the readLines() command usually works well (its output is a character vector, so if the text file has 10 lines, then readLines produces a vector of length 10, where each line is an element in that vector (you could use strsplit() to further split it into words. If the text is organized neatly in columns, however, you might still consider read.table(), but probably with the stringsAsFactors=F parameter (this avoids making long text strings into factors, read up on it if needed).

Exporting plots

RStudio has handy options to export plots - click on Export on top of the plot panel, and choose the output format. Plots can be exported using R code as well - this is in fact a better approach, since otherwise you would have to click through the Export… menus again every time you change your plot and need to re-export. Look into the help files of the jpeg and pdf functions to see how this works.

Animated plots

There are a number of ways for creating animated plots in R and making nice GIFs that you can use in a talk, on your website or wherever. There is the animation package, or on a Mac you can use ImageMagick’s Terminal commands to convert plot files into a GIF (you can send commands to Mac’s Terminal using the system() command; learn about loops to easily generate a number of plots with only a few lines of code).

# a simple loop that generates 12 plots
par(mfrow=c(4,3), mar=c(2,2,0,0)) # sets the plotting window to accommodate multiple plots and reduces margins
x = runif(12)*runif(12)     # random numbers
for (i in 1:12){            # this is a for-loop
  plot(x[1:i], xlab="", ylab="", type="o", xlim=c(1,12), ylim=c(0,1))
}

# save these as .png files instead (use png() ) and then convert the pngs into gif

Anything else

There are also packages to import (and manipulate) images, GIS map data (shapefiles), data from all sorts of other file formats (like XML, HTML) and many more. Just google a bit and you’ll find what you need.

Final words on attributions, citing and references

R and its packages are all free open-source software, meaning countless people have invested a lot of their own time into making this possible. If you use R, do cite it in your work (use the handy citation() command in R to get an up to date reference, both in plain text and BibTeX). To cite a package, use citation("package name"). You are also absolutely welcome to use any piece of code from this workshop, but in that case I would likewise appreciate if you would cite it: